[00:01.050 --> 00:09.110]  All right, so welcome everybody and thank you for taking the time to see our talk.
[00:09.110 --> 00:17.170]  We are happy to be here at the DEF CON Red Team Village and we are very excited to share with you
[00:17.170 --> 00:28.730]  some of the findings that we've been doing since we did a talk in the previous Red Team Village
[00:28.730 --> 00:37.650]  Summit where we try to showcase and expose basically how dangerous is the leakage of
[00:37.650 --> 00:43.510]  credentials on the internet. And today me and Jose Hernandez, my colleague, are going to be presenting
[00:43.510 --> 00:50.170]  Have My Keys Been Pwned? API Edition. So let's get going.
[00:52.250 --> 00:58.710]  So most of you know who we are. I'm a Principal Security Research Engineer at Splunk.
[00:58.710 --> 01:04.950]  I worked with Jose at Prolexic, which is now Akamai. I worked at Akamai for a while. Then I went to
[01:04.950 --> 01:13.510]  Caspida and came back at one point to Splunk. I co-founded the HackMiami and Pacific Hackers meetups
[01:13.510 --> 01:19.070]  and conferences, and I create my own CTFs. Some of you may have played them: the Kommand && KonTroll
[01:19.070 --> 01:26.250]  and NoQrtr CTFs. And Jose, can you tell us a little bit about yourself? Yeah, I'm also a Principal
[01:26.250 --> 01:34.150]  Security Researcher at Splunk. I'm an old longtime friend of Rod from our Prolexic days, which got
[01:34.150 --> 01:41.130]  purchased by Akamai. I co-founded a company called Zenedge, which is now Oracle's Web Application
[01:41.130 --> 01:46.870]  Firewall and DDoS services. And returned to Splunk to do research, security research. And this is one
[01:46.870 --> 01:52.890]  of the things that we've been working on that I'm super excited about. Awesome, so let's move on.
[01:52.890 --> 01:59.250]  So just before we get into basically what we're going to show you today, it is important
[01:59.790 --> 02:07.630]  that we recap a little bit on how we got to this point. And in order to recap
[02:08.630 --> 02:16.930]  to a point where we can understand how bad this is and how we got to this very bad situation, we need
[02:16.930 --> 02:25.470]  to understand or take a look at what DevOps is. So as we have approached it before and explained
[02:25.470 --> 02:32.270]  it, DevOps is a set of practices that basically sits between software development and IT operations.
[02:32.270 --> 02:39.010]  It has become very popular. It's not really new, but it has become very popular and it has been
[02:40.470 --> 02:50.610]  widely adopted as most companies are starting to somehow get a foot in cloud platforms.
[02:50.610 --> 02:56.770]  Some of them have radically moved most of their operations to the cloud. And when we're talking
[02:56.770 --> 03:05.390]  about developing software and producing software, building, coding it, planning it, testing it,
[03:05.390 --> 03:13.310]  releasing it, the first thing that comes to your mind is DevOps. And DevOps is a
[03:13.310 --> 03:21.630]  set of practices usually guided by some software development principles. And the most popular
[03:22.170 --> 03:28.430]  currently is Agile. And some of you might be familiar with Agile if you have worked in a
[03:28.430 --> 03:33.810]  software development company. I have now worked for three software development companies and
[03:33.810 --> 03:42.490]  all of them were using Agile. And this is how I was exposed to not only the infrastructure,
[03:42.490 --> 03:49.870]  but the risks that are associated with this. So next slide, please. So one of the things that
[03:50.970 --> 03:58.290]  I noticed throughout the years, and me and Jose have been researching on, is that when you use
[03:58.290 --> 04:06.290]  this set of principles and divide it in things that are called tool chains that go in the DevOps
[04:06.290 --> 04:15.410]  process, there's a number of products and a number of tools that are constantly used, reused, shared
[04:17.350 --> 04:27.890]  and repurposed. One of the characteristics of, to put it a certain way, the platform that is used
[04:27.890 --> 04:36.250]  for software development is, at times, its ephemeral character, meaning, for example,
[04:36.250 --> 04:43.630]  that you can create containers, destroy them and then simply recreate them again. So in order for
[04:43.630 --> 04:50.770]  us to give a little more structure to what we can look at when we're trying to gauge the
[04:50.770 --> 04:57.310]  exposure and risks associated with DevOps, we had to look at DevOps tool chains. And DevOps tool
[04:57.310 --> 05:03.130]  chains basically are a combination of tools that aid in the delivery, development, management of
[05:03.130 --> 05:09.130]  software and applications for the entire cycle. As you saw previously, there is an actual cycle
[05:09.130 --> 05:17.370]  where you code, you build, you plan, release, you test it and then you go back again. And if you are,
[05:17.370 --> 05:24.050]  for example, using a specific methodology such as Agile, usually there are sprints, and
[05:24.050 --> 05:30.730]  sprints are set periods of time in which you have to produce a number of features or bug fixes. And all of this
[05:30.730 --> 05:38.170]  is based in a flow that goes through what they call the tool chains. So in the tool chains,
[05:38.170 --> 05:43.210]  you can see there are things that are used for planning, such as Git or Jira. There are things
[05:43.210 --> 05:52.390]  that are used for coding that involve coding, code repositories. There are things used for
[05:52.390 --> 06:03.010]  testing that involve things such as Selenium or Vagrant or Docker containers, for example.
[06:03.010 --> 06:12.150]  There are tools that are made for software build, such as Ansible or Terraform
[06:12.150 --> 06:18.790]  or Chef. And then of course, there are things that are within the tool chains for what they call
[06:18.790 --> 06:25.910]  deployment, which involve things such as Kubernetes and Docker, for example. Most of these things,
[06:27.010 --> 06:33.970]  like Kubernetes, at times are used throughout the entire process. And depending on your
[06:35.670 --> 06:45.770]  cloud provider, you may be very familiar with some orchestration and automation languages such as
[06:46.490 --> 06:54.750]  Ansible or Terraform. And then finally, part of this tool chain is the monitoring part. And there
[06:54.750 --> 07:02.830]  we have things... usually the main two monitoring tools are either based on ELK, which is
[07:02.830 --> 07:15.430]  Elasticsearch, or, of course, Splunk. Next, please. So once we have seen the picture of the tool chains
[07:15.430 --> 07:24.670]  that are associated with the software development, plus the cycle throughout the flow of planning it,
[07:24.670 --> 07:28.310]  coding it, building it, testing it, releasing it, and then coming back,
[07:28.310 --> 07:34.390]  there's one thing that we're focusing on today, and that is credentials. Credentials are part of the
[07:34.390 --> 07:43.330]  entire process. Credentials are needed for many reasons. And here's a little bit of some of the
[07:44.370 --> 07:52.310]  highlights of what happens in this process with credentials. So developers, for example,
[07:52.310 --> 07:59.370]  usually have high privilege credentials. Why? Because they have to be able to test things that
[07:59.370 --> 08:06.430]  would run with low privileges. They are supposed to develop things that may be
[08:07.030 --> 08:14.530]  kernel libraries, sockets that need to be created with services or connections.
[08:14.530 --> 08:24.010]  These environments at times are usually ephemeral, like I said, and as a result of that,
[08:24.010 --> 08:31.390]  many times they're dismissed and poorly monitored. We're seeing cases of developers
[08:31.390 --> 08:38.590]  downloading anything from Docker Hub or who knows what have you, a container repository on the
[08:38.590 --> 08:44.770]  internet and simply putting them in their DevOps tool chains without any checking, without any
[08:44.770 --> 08:52.010]  scanning, and now you may have implanted containers, vulnerable libraries, vulnerable
[08:52.010 --> 08:58.750]  operating systems that will eventually affect and be published into your production environment.
[09:00.070 --> 09:06.510]  Obviously, this is related as well. There's a disconnection between development and security
[09:06.510 --> 09:14.590]  operations. I've been part of this problem many times and I have interacted with developers.
[09:14.590 --> 09:19.990]  Developers usually don't like when you tell them that their code has bugs or vulnerabilities.
[09:21.270 --> 09:26.150]  Most of the times there's not a straight link in between what they're developing, so I'll give you
[09:26.150 --> 09:31.830]  an example. You have a development department and they download, for example, all these containers
[09:31.830 --> 09:39.090]  or libraries or code. Many times there are no tools to verify this. Many times there is no
[09:39.730 --> 09:45.010]  inventory of what it is that they're downloading, what it is that they're using, what libraries
[09:45.010 --> 09:55.570]  they're putting into the software packages. So this by itself is a risk. And as well,
[09:55.570 --> 10:01.130]  this is also very popular and this is part of the nature of the DevOps process.
[10:01.130 --> 10:09.550]  There's a widespread use of open source tools and code. So basically, many times I notice in this
[10:09.550 --> 10:16.050]  environment that this code is trusted by default, meaning they just go, oh, I know this developer.
[10:16.050 --> 10:21.850]  Oh, I know this group. I'm just going to download this and use it in my application. And again,
[10:21.850 --> 10:28.150]  this goes back to the disconnection, but there is some sort of an honor code between the open
[10:28.150 --> 10:36.090]  source community where do no harm is always the driver of software development. But it does not
[10:36.090 --> 10:43.110]  mean, and we have seen this in supply chain attacks, that malicious actors, nation states,
[10:43.110 --> 10:49.350]  criminals in general, may target these repositories, these open source communities,
[10:49.350 --> 10:57.370]  and embed bad stuff in them. Also, embedded credentials usually end up in public repositories
[10:57.370 --> 11:05.130]  and that's pretty much what we're going to show you today. How even big companies, how individuals
[11:07.390 --> 11:16.590]  that... unfortunately, there are no mechanisms. All in all, we don't blame them for what you're
[11:16.590 --> 11:22.410]  about to see. We don't think that they're purposely leaking these credentials. However,
[11:22.410 --> 11:28.310]  it is important that we point this out in order to bring awareness that there has to be
[11:28.310 --> 11:36.310]  mechanisms created in order to avoid the rampant, because there's no other name for it, this rampant
[11:36.310 --> 11:44.910]  leakage of credentials. And we will show today how bad it can get. So when we have also
[11:45.550 --> 11:54.610]  development departments and DevOps processes, where most of these developers have
[11:55.010 --> 12:03.210]  high privilege credentials or permissions, there is obviously a higher risk of insider threat,
[12:03.210 --> 12:10.890]  because it takes less of an effort to cause harm, to embed malicious stuff in it, or to even destroy
[12:10.890 --> 12:16.870]  it. Like we have seen in some cases before, where a person that was part of a development department
[12:16.870 --> 12:26.370]  has come back, or a sysadmin has come back, and done harm to an employer. And another couple of points
[12:26.370 --> 12:37.430]  is that due to the CI/CD nature, the continuous delivery, basically that cycle that goes around
[12:37.430 --> 12:47.630]  building, testing, developing, and coming back to planning and so on, these things
[12:47.630 --> 12:54.030]  get published immediately. That's part of the nature of the DevOps process: the DevOps
[12:54.030 --> 13:04.130]  process has shortened the time in which you plan, code, build, test, release. It basically becomes something
[13:04.130 --> 13:11.350]  almost immediate that goes into production. And this by itself represents a risk, because when you,
[13:11.350 --> 13:20.030]  like I told somebody, when you have vulnerabilities in environments that are
[13:20.610 --> 13:25.330]  driven by CI/CD and things like, for example, implanted containers, and you have very large
[13:25.330 --> 13:31.650]  environments, the risk and the opportunity of exploitation increases in orders of magnitude.
[13:33.290 --> 13:39.750]  So finally, and just to put this, to make this even worse, unfortunately, the cloud
[13:39.750 --> 13:45.050]  environments have made this risk even higher. Why? Because basically you are connected to the
[13:45.050 --> 13:54.430]  internet, you publish right away, you stage right away, and if there are attackers that are
[13:54.430 --> 14:01.090]  knowledgeable and are able to pretty much footprint your process, they can do a lot of harm.
[14:01.090 --> 14:08.010]  Next slide, please. So here's a little bit for you to have a reference of how
[14:08.830 --> 14:16.330]  the cloud providers manage credentials. We're going to focus mostly on cloud-related environments.
[14:16.330 --> 14:24.750]  So here's, for example, AWS. They have their own IAM credential service, which usually
[14:24.750 --> 14:33.850]  has things such as passwords, access keys, key pairs, or SSH keys. And also they do have a number
[14:33.850 --> 14:43.370]  of temporary security credentials that can be created on the go, and that sometimes have a
[14:43.370 --> 14:50.550]  feature whereby you can give a specific user temporary access to a resource that
[14:50.550 --> 14:58.930]  otherwise that user does not have access to. So next up. Most of the providers have
[14:58.930 --> 15:07.710]  sort of similar systems for managing credentials. However, I have to give kudos to Microsoft because
[15:07.710 --> 15:14.830]  they are trying actually to tackle this problem. And as you can see here, they do have, this is an
[15:14.830 --> 15:23.850]  example of how they manage credentials. They use the Azure, there's a framework within Azure
[15:24.790 --> 15:35.210]  Active Directory where it tries to avoid the embedding of creds, and they use a different
[15:35.210 --> 15:43.790]  mechanism. We're not going to focus on this feature of Azure in this talk, but it's important for you
[15:43.790 --> 15:49.810]  to consider it and look at it because they definitely seem to be aware of the issue.
[15:49.810 --> 15:59.430]  Next slide please. And then here's, we wanted to give you a little sample of basically what's
[15:59.430 --> 16:07.070]  happening with the three main cloud providers, which are AWS, GCP, and Azure. In the case of
[16:07.070 --> 16:15.270]  GCP or Google Cloud, most of the stuff is based on OAuth, which is a
[16:15.270 --> 16:23.470]  protocol that's used for identity federation and single sign-on. And for the most part,
[16:23.470 --> 16:31.830]  because of the constant interaction and services that are present in cloud environments,
[16:31.830 --> 16:39.190]  obviously these providers have to come up with a way to allow the interaction of either devices,
[16:39.190 --> 16:45.590]  users, and services, dividing and trying to establish boundaries between these entities.
[16:45.590 --> 17:01.810]  And as we will see soon, this is very challenging. Next slide. So here's a general list of credentials
[17:01.810 --> 17:11.390]  that are usually used and, as such, potentially exposed when you have developers that are
[17:11.390 --> 17:20.630]  publishing or storing code in public repositories. So things such as email and password, username and
[17:20.630 --> 17:27.650]  password. Remember, there's a difference between your local, for example, Active Directory or LDAP
[17:27.650 --> 17:37.890]  and the cloud identity access management. Sometimes this brings up a lot of confusion.
[17:40.290 --> 17:46.790]  And depending on the integration that you have with your cloud environment, this may or may not
[17:46.790 --> 17:53.190]  play in your favor. Meaning, if you're not very integrated, losing, for example, the username and
[17:53.190 --> 18:00.250]  password from a specific cloud service will not allow the attacker to access your internal
[18:00.250 --> 18:11.270]  environment. Also, multi-factor authentication is something that's been coming, and at times it
[18:11.270 --> 18:19.870]  can be bypassed by certain frameworks. We've done some work before with Evilginx, which basically
[18:19.870 --> 18:28.530]  is able to capture the second factor, the authentication or the TOTP, whatever interface is presented to
[18:28.530 --> 18:34.730]  the user, and bypass MFA. Access keys, we talked about it. Key pairs, we talked about it.
[18:34.750 --> 18:43.770]  Specific account identifiers, and at times they use X.509 certificates. Next up.
[18:45.150 --> 18:54.110]  So, what are the primary sources of leaked credentials? Well, as you will see soon,
[18:54.110 --> 19:02.370]  GitHub is probably the most popular code repository on the internet. GitHub is now used for many other
[19:02.370 --> 19:11.870]  things, storing files, even hosting web pages, which is kind of cool. So, GitHub is like the
[19:11.870 --> 19:20.770]  reference when it comes to the leading internet code repository, not only publicly, but many
[19:20.770 --> 19:26.870]  companies use it. And then we also have GitLab and we also have Amazon S3
[19:30.550 --> 19:37.550]  buckets, storage. The reason why I put this here is because you can definitely search
[19:37.550 --> 19:45.290]  for Amazon S3 buckets that are open or have write or read privileges. And there's not
[19:45.290 --> 19:54.530]  only data stored in it, but tons of code with possibly embedded keys are usually found in this
[19:54.530 --> 20:03.810]  environment. Next up. All right. So, as I was explaining with GitHub and GitLab and even S3
[20:03.810 --> 20:12.930]  buckets, they're not the only source of leaked credentials. As you can see right now,
[20:12.930 --> 20:21.470]  I basically Google dorked AKIA, which is usually how Amazon permanent keys start.
[20:21.470 --> 20:27.150]  And I was able to find this snippet. Fortunately, the person that posted this sanitized the keys,
[20:27.150 --> 20:32.850]  but that doesn't mean this happens all the time. And with this, I just wanted to show you an
[20:32.850 --> 20:38.690]  example that it's not just code repositories. It can be anything. I actually had a friend that
[20:38.690 --> 20:47.110]  lost his username and password for his Gmail and it turned into an absolute nightmare.
[20:47.150 --> 20:53.510]  The attackers actually reset everything he had and it took him like a week and even him being
[20:53.510 --> 20:58.790]  part of the community to get a hold of Google in order to reset this. So, please be very careful
[20:58.790 --> 21:06.950]  with these things. And that was a... my friend was working for a very large company, but there
[21:06.950 --> 21:12.030]  are other examples that we're going to show you where the attacker may not be so obvious,
[21:12.030 --> 21:16.330]  yet cause even more damage. So, let's go on the next one.
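
The AKIA search described above can be sketched as a simple pattern match. This is a minimal illustration, not the speakers' tooling: the function names are hypothetical, the pattern assumes the common 20-character access key ID layout (the `AKIA` prefix plus 16 uppercase letters or digits), and the sample value is the placeholder key ID that AWS uses in its own documentation.

```python
import re

# Assumption: a long-term AWS access key ID is "AKIA" followed by
# 16 uppercase letters or digits (20 characters in total).
AKIA_PATTERN = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_key_ids(text):
    """Return every AKIA-prefixed access key ID found in the text."""
    return AKIA_PATTERN.findall(text)


def redact(text):
    """Mask everything after the AKIA prefix so a snippet can be shared safely."""
    return AKIA_PATTERN.sub(lambda m: m.group(1)[:4] + "*" * 16, text)


# Placeholder key ID taken from AWS documentation, not a real credential.
sample = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\nregion = "eu-central-1"'

print(find_access_key_ids(sample))
print(redact(sample))
```

A match only shows that a string has the shape of a key ID. Whether the key is live, and what it can do, depends on the account it belongs to.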
[21:17.990 --> 21:25.310]  So, before we can continue on this presentation, it is important for you to understand
[21:26.010 --> 21:36.970]  that when we look at the context and nomenclature of attacks on the internet, on the cloud, or inside
[21:36.970 --> 21:44.810]  the perimeter, we always look at the MITRE ATT&CK cloud matrix, in this case. Like I said before,
[21:44.810 --> 21:50.570]  we are going to focus on cloud-related types of environments. So, in this case, we're talking
[21:50.570 --> 21:56.030]  about basically unsecured credentials. Unsecured credentials, for example, if you leave a credential
[21:56.030 --> 22:02.870]  in an Amazon S3 bucket or it's embedded in some code in GitHub, can lead to things such as what
[22:02.870 --> 22:07.770]  is called valid accounts. And valid accounts can be used for initial access, persistence,
[22:07.770 --> 22:16.030]  lateral movement, and privilege escalation. And I'm going to give you an example of it as we move
[22:16.030 --> 22:22.170]  on in this presentation. So, just keep this in mind. So, let's move on.
[22:24.110 --> 22:36.670]  One of the things that we're looking at here is a technique which is T1078.004,
[22:36.670 --> 22:45.190]  which is valid cloud accounts. So, you're obtaining cloud accounts that basically,
[22:45.190 --> 22:51.450]  in one of the scenarios that we're going to propose, we're able to find access keys that
[22:51.450 --> 22:57.870]  then allow you to not only access the provider, but move laterally and escalate privileges.
[22:59.710 --> 23:08.490]  These attack vectors are real. And many times, they get dismissed because the company,
[23:08.490 --> 23:15.810]  to be honest with you, does not have an awareness of the reach of the cloud within its
[23:15.810 --> 23:22.330]  perimeter. As we move on in the cloud adoption, there are many hybrid environments. And in these
[23:22.330 --> 23:28.490]  hybrid environments, parts of your cloud infrastructure would allow access to your
[23:28.490 --> 23:34.650]  perimeter, either because the developers do it or because you are in IT operations,
[23:34.650 --> 23:40.730]  and there are some servers that have some access to S3 buckets, for example, or you have a WAN,
[23:40.730 --> 23:46.770]  or you have a cloud VPN. So, these are scenarios that are important to consider that are real.
[23:46.770 --> 23:52.150]  The line between the perimeter and the internet gets blurred or even disappears
[23:53.230 --> 23:59.510]  with the adoption of cloud technologies. And here's an example, something that you should read
[23:59.510 --> 24:06.870]  on. I know this is an evolving framework, and we may be giving you a number that will change
[24:06.870 --> 24:12.210]  tomorrow or even the definition of it. But it's important to understand that there is some work
[24:12.210 --> 24:17.710]  associated with these attack vectors and these vulnerabilities. And this is what we're trying to
[24:18.270 --> 24:21.610]  showcase today. Next, please.
[24:22.470 --> 24:33.150]  So here, as I set the stage, I wanted to give you a little bit of an example and something that
[24:33.150 --> 24:39.290]  we are actually working right now, and we will be presenting more research in the future, which is
[24:39.290 --> 24:47.030]  lateral movement and escalation of privilege by simply obtaining keys, right? You can obtain
[24:47.030 --> 24:56.930]  keys. Jose and I did a presentation, which was called Red Teaming DevOps, in the Mayhem Summit of
[24:56.930 --> 25:03.890]  Red Team Village, where we explained ways of either phishing or finding these credentials,
[25:03.890 --> 25:08.180]  which we're going to show part of today. But for example, if you were able to get... I'm not
[25:09.450 --> 25:14.850]  sure if you remember how I was able to Google for permanent
[25:14.850 --> 25:21.550]  AWS keys, which start with AKIA. You can do things, depending on the actual
[25:23.770 --> 25:31.090]  user; you can even create new role trust policies, you can add yourself to role trust policies. So
[25:31.090 --> 25:37.950]  for example, let's say there was a trust policy that allows you to access certain buckets where
[25:37.950 --> 25:46.330]  there is sensitive data, you will be able to basically access that data by simply starting
[25:46.330 --> 25:53.390]  from the compromise of these keys, or you can create temporary keys by either STS AssumeRole
[25:53.390 --> 26:00.350]  or GetSessionToken. It will depend on the policies that are in place and the privilege
[26:00.350 --> 26:09.210]  of the users. However, this is not... the boundaries are not that strict. And because
[26:09.210 --> 26:15.870]  this is an evolving and new technology, it's not that difficult, or far fetched to say that if
[26:15.870 --> 26:22.110]  you're able to get permanent keys that have been leaked on the internet, you might be able to
[26:22.110 --> 26:28.670]  basically move around in these specific environments. So here's an example of how you can abuse tokens
[26:29.390 --> 26:37.330]  specifically in AWS by either obtaining permanent keys or compromising a session
[26:37.330 --> 26:44.350]  where they're already using temporary keys, which usually start with ASIA. And then from there,
[26:44.350 --> 26:50.290]  you can abuse temporary tokens using things such as AssumeRole. AssumeRole is a
[26:50.290 --> 26:57.210]  cross-account feature given by AWS to, for example, provide temporary access for a user to a
[26:57.210 --> 27:03.490]  resource that he may not have access to, or things like GetSessionToken, which are permanent,
[27:03.490 --> 27:10.170]  temporary, sorry, temporary tokens that can be used for specific features. So please be on the
[27:11.210 --> 27:19.590]  lookout for these things because they are... this has not really exploded publicly,
[27:19.590 --> 27:26.370]  but it's definitely something that is happening right now. Next, please. And with that,
[27:26.370 --> 27:30.090]  I'm going to pass it to Jose, and we're going to see an awesome demo.
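
For defenders, the key prefixes and role trust policies discussed above lend themselves to a small audit sketch. Everything named here is illustrative: `key_kind`, `assume_role_principals`, and `unexpected_principals` are hypothetical helpers, the account IDs and ARNs are placeholders, and the only assumptions about AWS are the `AKIA`/`ASIA` prefixes mentioned in the talk and the standard JSON layout of a role trust policy (`Statement`, `Effect`, `Principal`, `Action`).

```python
import json


def key_kind(access_key_id):
    """Classify an access key ID by prefix: AKIA is long-term, ASIA is temporary."""
    if access_key_id.startswith("AKIA"):
        return "long-term"
    if access_key_id.startswith("ASIA"):
        return "temporary"
    return "unknown"


def as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def assume_role_principals(trust_policy_json):
    """List the AWS principals a trust policy allows to call sts:AssumeRole."""
    policy = json.loads(trust_policy_json)
    principals = []
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, str):  # e.g. the wildcard "*"
            principals.append(principal)
        else:
            principals.extend(as_list(principal.get("AWS", [])))
    return principals


def unexpected_principals(trust_policy_json, expected):
    """Return principals that can assume the role but are not on the expected list."""
    return [p for p in assume_role_principals(trust_policy_json) if p not in expected]


# Placeholder trust policy; both ARNs are made up for illustration.
TRUST_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::111122223333:user/build-bot",
            "arn:aws:iam::444455556666:user/unknown",
        ]},
        "Action": "sts:AssumeRole",
    }],
})

expected = {"arn:aws:iam::111122223333:user/build-bot"}
print(key_kind("ASIAIOSFODNN7EXAMPLE"))
print(unexpected_principals(TRUST_POLICY, expected))
```

Comparing each role's trust policy against an expected list is one way to notice a principal that added itself after a key compromise. It does not replace monitoring of the API calls that modify those policies.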
[27:31.610 --> 27:39.510]  Hey, so give me two seconds here. I'm going to go ahead and share my terminal as well as the slides.
[27:42.550 --> 27:47.990]  Let me know when... Rod, can you confirm you can see both the terminal and the slides?
[27:49.850 --> 27:56.730]  Perfect. Okay. Sweet. So I want to do a demo live today, if everyone's okay with that. I don't
[27:56.730 --> 28:01.350]  usually do this, but I'm so confident that... again, I'm so confident about how comfortable
[28:01.350 --> 28:06.450]  this tool runs and finds leaks. I want to do it live. And what I'm going to show in the demo today
[28:06.450 --> 28:14.150]  are basically three basic steps of how you would use Git Wild Hunt, the tool that we built to find new
[28:14.150 --> 28:18.950]  credentials in GitHub. The first step is going to be installing it and deploying it. The second
[28:18.950 --> 28:23.530]  step is going to be searching for leaked AWS credentials. And then the third step, we're going
[28:23.530 --> 28:29.090]  to just really quickly dig into the data that is generated from one of our searches or hunts.
[28:29.370 --> 28:37.290]  So the tool, I already have the tab here prepared. The tool is on GitHub. Unsurprisingly,
[28:37.830 --> 28:42.250]  literally, we're searching GitHub for leaks, and we're also using GitHub to host our code,
[28:42.250 --> 28:48.370]  on the d1vious GitHub. So I'm just going to go ahead and just clone the project really quickly
[28:48.370 --> 28:57.510]  here. And that's going to bring our project down. I'm just going to get back to our slides.
[28:58.750 --> 29:06.250]  So just a few notes. Again, going back to just... while this is cloning down, to Rod's point,
[29:06.250 --> 29:12.450]  this leaking of credentials is totally normal. I've actually made these mistakes before. I've been
[29:12.450 --> 29:19.830]  in incident responses where colleagues have made these mistakes. It happens, and there are
[29:19.830 --> 29:24.390]  very few mechanisms today out of the box to protect against it, but there are good mitigation mechanisms
[29:24.390 --> 29:28.970]  out there. Again, to Rod's point, it hasn't exploded yet, so it's not actively
[29:29.610 --> 29:34.730]  being mitigated for either. So here we go. We've cloned down Git Wild Hunt,
[29:34.730 --> 29:39.450]  and the first step to install it is... I'm going to go prepare a virtual environment,
[29:40.970 --> 29:46.690]  and I've used virtualenv for this. So we're creating a virtual environment in Python 3,
[29:46.690 --> 29:51.250]  and this is just in order for us to install all of our dependencies separated from our system Python.
[29:52.170 --> 29:55.570]  And so the next step here is I'm going to activate my virtual environment and then
[29:56.250 --> 30:04.710]  install my required dependencies for the project. The project is pretty lightweight. There's not a
[30:05.650 --> 30:12.390]  lot of dependencies. And now I have a... yeah, the tool should execute just perfectly fine.
[30:12.390 --> 30:17.350]  And the tool is pretty straightforward. There's a config file that you really don't have to do
[30:17.350 --> 30:27.890]  much with it besides configuring the actual GitHub API token. So now that we've configured our token
[30:27.890 --> 30:33.290]  in there, we should be ready to go. And what I'm going to do here in this case is I'm going to
[30:33.290 --> 30:40.470]  use one of the example searches that the tool already has out of the box. By the way, the tool,
[30:40.470 --> 30:45.250]  actually, before I jump into running it really quickly, let me walk you through how the tool
[30:45.250 --> 30:51.110]  kind of functions. The first function of it is you pass it a search parameter, and this is the
[30:51.110 --> 30:57.810]  equivalent of a GitHub advanced search. And we have a few examples in here, like how to find
[30:57.810 --> 31:06.750]  GCP JWT tokens, AWS API secrets, Azure JWT tokens, so on and so forth. And then once,
[31:06.750 --> 31:12.870]  what it does is it's going to go ahead and search GitHub for files that match these patterns. And
[31:12.870 --> 31:17.710]  then it checks, it reads every single file that returns in the results, and it checks whether
[31:17.710 --> 31:26.710]  that file has a valid credential from a set of regexes. And by the way, huge kudos and credits
[31:26.710 --> 31:31.490]  to truffleHog, which is the project we're actually borrowing a lot of these regexes from.
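
The two steps just described (search for candidate files, then check each file against a set of regexes) can be sketched as follows. This is not the tool's actual code: `build_search_url` and `check_file` are hypothetical names, the two patterns are simplified stand-ins rather than the regexes borrowed from truffleHog, and the query string is only an example of a GitHub code search. The sketch builds the request URL for GitHub's code search endpoint but does not send it, since a real request needs an API token.

```python
import re
from urllib.parse import urlencode

# GitHub REST API endpoint for code search (requests must be authenticated).
GITHUB_CODE_SEARCH = "https://api.github.com/search/code"

# Simplified stand-in patterns; a real rule set would be much larger.
REGEXES = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "RSA private key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}


def build_search_url(query, page=1, per_page=100):
    """Step 1: build the code search URL for a query (the request itself is not sent)."""
    params = {"q": query, "page": page, "per_page": per_page}
    return GITHUB_CODE_SEARCH + "?" + urlencode(params)


def check_file(url, content):
    """Step 2: report every pattern that matches inside one file's content."""
    findings = []
    for name, pattern in REGEXES.items():
        for match in pattern.findall(content):
            findings.append({"url": url, "type": name, "match": match})
    return findings


# Example query and file content; the key ID is AWS's documentation placeholder.
url = build_search_url("filename:credentials aws_access_key_id")
content = "[default]\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
findings = check_file("https://example.invalid/some/file", content)

print(url)
print(findings)
```

Splitting the work this way keeps the broad search cheap and leaves the regex pass to filter out files that merely mention credentials without containing one.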
[31:31.830 --> 31:35.850]  But it's basically a two-step function tool, right? First, it searches for
[31:36.710 --> 31:39.590]  potentially leaked files, and then it verifies that inside those files,
[31:39.590 --> 31:46.370]  you actually have credentials in them. In this case, again, let's try to run an example for
[31:46.370 --> 31:54.650]  pulling back AWS secrets, right? And so once we run it, the first thing we're going to get back
[31:54.650 --> 32:01.550]  is the total results that we found in GitHub for files that match this. And then you're going to,
[32:01.550 --> 32:05.510]  the tool's going to go ahead again and process every single result, right? And it's printing
[32:05.510 --> 32:12.130]  out here the actual URLs, checking whether there's actual credentials in them. And here we have our
[32:12.130 --> 32:19.330]  first kind of hit from Bill Rosey, where he actually committed an AWS secret. And if we
[32:19.330 --> 32:26.990]  actually go into this URI, we should be able to see his AWS access key. I'm just going to open it
[32:26.990 --> 32:32.670]  here in my browser, and yep, there you go. That's literally his secret key, and it's out of the EU
[32:32.670 --> 32:40.330]  central region. So as I explained earlier, with this key, we literally have the exact same
[32:40.330 --> 32:47.990]  permissions that Bill Rosey had in managing AWS. And so the tool, again, the tool runs,
[32:47.990 --> 32:52.550]  and it's going to take a few minutes because there's 200-something results, right? We're at about
[32:52.670 --> 32:59.590]  leak number 95. And it collects all this data and saves it into a JSON file, essentially.
[32:59.830 --> 33:07.690]  Now, while the tool is running, I want to go ahead and show you a bit of... so we grabbed the data
[33:07.690 --> 33:11.670]  out of this JSON file. I'll show you at the end of the execution what it looks like. But we grabbed
[33:11.670 --> 33:17.290]  all that data. We've been collecting these leaks for about seven days now. And I pulled some quick
[33:19.110 --> 33:25.270]  reports on what we've collected so far in the past seven days. And for example, for top leaks
[33:25.270 --> 33:33.070]  or top leaks by technology, we have by far a whole lot of AWS API keys out there that we've collected
[33:33.070 --> 33:41.170]  and second to that is actually GCP service accounts. I couldn't say I wasn't... this did
[33:41.170 --> 33:44.370]  not surprise me. I don't know, Rod, if you got surprised when you saw this data set, but this
[33:44.370 --> 33:49.230]  was kind of somewhat expected. AWS by far is the secret that gets leaked the most.
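
A "top leaks by technology" report like this one can be sketched as a simple count over the collected records. The `leak_type` field name, the type labels, and the sample records below are hypothetical and do not reflect the speakers' data or schema.

```python
from collections import Counter


def top_leaks_by_type(records):
    """Count records per leak type, most frequent first."""
    counts = Counter(record["leak_type"] for record in records)
    return counts.most_common()


# Invented sample records; the real data set is not reproduced here.
sample_records = [
    {"leak_type": "aws_api_key"},
    {"leak_type": "gcp_service_account"},
    {"leak_type": "aws_api_key"},
    {"leak_type": "rsa_private_key"},
    {"leak_type": "aws_api_key"},
    {"leak_type": "gcp_service_account"},
]

for leak_type, count in top_leaks_by_type(sample_records):
    print(f"{leak_type}: {count}")
```
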
[33:50.330 --> 33:54.830]  Somewhat surprising, there's still a lot of RSA private keys out there, which I was not
[33:54.830 --> 34:04.690]  expecting. Another thing that, again, just another curiosity is that AWS secrets, if you break this
[34:04.690 --> 34:12.410]  down by the last seven days, it doesn't vary a whole lot. It seems that by far, again, AWS seems
[34:12.410 --> 34:19.610]  to be the thing that gets leaked the most, followed by GCP. Although what they call
[34:19.610 --> 34:25.790]  YouTube OAuth tokens or the tool calls YouTube OAuth tokens are actually generic OAuth tokens as
[34:25.790 --> 34:31.630]  well for Google. It just considers it like a specific YouTube one, but it's not. And so you
[34:31.630 --> 34:38.470]  can see that actually varies every so often across days. And so for the most part, people are either
[34:39.090 --> 34:44.470]  leaking their Google suite tokens, which is really what the YouTube OAuth tokens are,
[34:44.470 --> 34:49.790]  or the GCP tokens, which is pretty bad because, again, now you have full access to whatever they
[34:49.790 --> 34:59.770]  can do with Google on their account. We did a breakdown as well by top companies. And mind you,
[34:59.770 --> 35:06.330]  where this data comes from is that if the user, because we're searching GitHub, and GitHub gives
[35:06.330 --> 35:11.770]  users abilities to input things like what blog they have, what their Twitter handle is, and the
[35:11.770 --> 35:17.010]  profile, and what company they work for, we pull that information back when we have a match,
[35:17.010 --> 35:24.490]  since that's open data. And this is just a very, very ugly picture of which companies had the most
[35:24.490 --> 35:30.750]  credentials in what we collected. X by Orange is by far the company that leaked the most,
[35:30.750 --> 35:33.930]  or at least that was labeled as leaking the most, but we got some pretty big ones out here like
[35:33.930 --> 35:42.530]  Nordstrom, VMware, and again, these are the top. You see here the "other" label; that means that
[35:42.530 --> 35:48.250]  there's a bunch of credentials that were leaked, but not necessarily multiple of them. These are
[35:48.250 --> 35:55.710]  multiple credentials for some of these companies. Microsoft was in there. Yeah, pretty bad, to say
[35:55.710 --> 36:03.910]  the least. Yeah, so this is very revealing because many times, and it's like what Jose was just
[36:03.910 --> 36:10.370]  saying, there just doesn't seem to be awareness of how bad this is. And we're trying to show you
[36:10.550 --> 36:18.130]  a picture: we collected for, what, a week or so, and look at all the stuff we have. And we actually,
[36:18.130 --> 36:26.730]  we basically did this for analytic purposes and to raise awareness. We could have
[36:26.730 --> 36:32.270]  delved even deeper into this data, and who knows what we would have been able to get. And if we are
[36:32.270 --> 36:37.210]  doing it, bad guys are doing it too. So this is something that's very remarkable,
[36:37.210 --> 36:42.650]  that you see some big names in there. I understand there are a number of mitigating factors, like you
[36:42.650 --> 36:48.350]  can say, well, you can get my key, but if your IP is not in my security group, you will not be
[36:48.350 --> 36:56.310]  able to log in. True, but we don't know that. And the fact that those
[36:59.250 --> 37:05.210]  keys are being leaked opens the possibility. It's almost like an open port. That's how I see
[37:05.210 --> 37:08.910]  it. It's almost like an open port of a vulnerable application.
[37:10.190 --> 37:15.470]  And one note I wanted to add on this data also, and it was a really good point about this:
[37:15.470 --> 37:21.790]  again, this is what individuals have put in their company profile. There's some garbage
[37:21.790 --> 37:28.350]  in here too, like this LinkedIn profile or ABC, which we haven't necessarily cleaned out of our
[37:28.350 --> 37:34.650]  data yet. But this is straight up, again, for the users that we collected or we've seen leaks for,
[37:34.650 --> 37:40.150]  what companies they have put in their profile. And again, we also kind of want to break this down
[37:40.150 --> 37:47.570]  by region. My favorite region so far that users have listed is "in your heart." This is pretty sweet.
[37:50.570 --> 37:54.470]  But unsurprisingly, and actually, Rod, you made this point to me,
[37:54.670 --> 38:02.070]  big cities that have a large development footprint, right, like a lot of developers and a
[38:02.070 --> 38:08.610]  huge tech footprint, are showing up here, right? New York, New York, Santa Monica, San
[38:08.610 --> 38:16.550]  Francisco, Sweden, Russia, Pittsburgh, right? So again, the cities that typically have huge
[38:16.550 --> 38:20.930]  companies doing development and a lot of developers. And there, again, the data,
[38:20.930 --> 38:25.970]  we haven't really cleaned up the data, hence why you have "in your heart" in here. But I just want
[38:25.970 --> 38:30.930]  to give you a really rough view of like, what if we were just trying to aggregate this really,
[38:30.930 --> 38:34.310]  really quickly, and we did it for seven days, what it looks like.
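
The breakdown by region is computed on uncleaned profile data. A rough sketch of what a light cleanup pass before counting might look like follows. The field name `location`, the placeholder list, and the sample records are assumptions for illustration, not the speakers' code or data.

```python
from collections import Counter

# Hypothetical list of free-text profile values to treat as noise.
PLACEHOLDERS = {"", "abc", "in your heart"}


def normalize(value):
    """Lowercase and collapse whitespace; return None for missing or placeholder values."""
    if value is None:
        return None
    cleaned = " ".join(value.split()).lower()
    return None if cleaned in PLACEHOLDERS else cleaned


def count_field(records, field):
    """Count normalized values of one profile field across all records."""
    counts = Counter()
    for record in records:
        value = normalize(record.get(field))
        if value is not None:
            counts[value] += 1
    return counts


# Invented sample records; null fields model profiles that do not expose the data.
sample_profiles = [
    {"location": "San Francisco"},
    {"location": "  san   francisco "},
    {"location": "In your heart"},
    {"location": None},
    {"location": "Pittsburgh"},
]

print(count_field(sample_profiles, "location").most_common())
```

Exact-match normalization like this will not merge variants such as "SF" and "San Francisco". That would need a lookup table or a geocoding step, which is out of scope here.
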
[38:34.310 --> 38:43.410]  Right. And if you were targeting a company, right, which is what you and I were talking about yesterday,
[38:43.410 --> 38:50.890]  we can hypothesize that by the number of leaks, and if we reduce that to the regions, because we
[38:50.890 --> 38:55.310]  know where most of the developers are working, even though that's changing a little bit because
[38:55.310 --> 39:00.890]  of the work from home, that sort of cuts off a little bit of the work that an adversary
[39:02.070 --> 39:10.130]  needs to do in order to try to infiltrate one of these big companies that by omission or
[39:10.130 --> 39:19.030]  willingly may have developers that are posting keys that are revealing too much.
[39:20.330 --> 39:25.210]  That's true. By the way, our tool finished here. I just want to give
[39:27.210 --> 39:35.930]  the viewers here a quick preview of what the results file gets written as. So if I just,
[39:36.810 --> 39:41.890]  since this is a valid JSON, you can see here the data we're collecting, right? So again,
[39:41.890 --> 39:49.210]  we're dumping every result that's getting matched into an array of JSON objects here,
[39:49.210 --> 39:54.830]  and we collect the URL where we found the leak, which we checked actually
[39:54.830 --> 40:02.270]  matched, and the different matches, right? So in the matches, I purposely went out of my way to not
[40:02.270 --> 40:10.030]  necessarily store the actual secret, but merely just the keys if possible. The owner, so who owns
[40:10.030 --> 40:15.870]  that repository that leaked that credential, owner URL, the type, again, if it's a company,
[40:15.870 --> 40:22.930]  it's going to get listed as a company. The name, email, the company, again, if they listed it in their
[40:22.930 --> 40:27.250]  GitHub profile, their blog, the location that they listed, this is where this data
[40:27.250 --> 40:32.710]  is coming from over here, Twitter handles, so on and so forth, right? And again, some of these
[40:32.710 --> 40:40.710]  fields tend to be null if the users don't allow their profile in GitHub to show this data. That's
[40:40.710 --> 40:49.670]  the only way we can actually read it. But yeah, again, pretty telling dataset. And with that,
[40:49.670 --> 41:08.800]  I kind of want to talk about one example that stood out: the CTO of a telecom. It was pretty
[41:08.800 --> 41:17.540]  wild that, again, because the CTO had listed on his GitHub profile, his Twitter handle,
[41:17.540 --> 41:23.020]  as well as his personal blog site, we were able to find his LinkedIn and essentially...
[41:23.800 --> 41:28.880]  Pretty much everything. We found pretty much everything about this gentleman. And
[41:31.380 --> 41:38.240]  obviously, we have sanitized everything, so you can't find him yourself. But just like him,
[41:38.240 --> 41:44.580]  there's many, many cases of people that we found. Right, it's just an example.
[41:45.080 --> 41:48.540]  We've contacted them, again, to clean things up. Our intent is to make sure these things stay
[41:48.540 --> 41:55.340]  cleaned up. Again, we want to bring awareness, as this is a big issue. And again, I've made these
[41:55.340 --> 42:00.900]  mistakes in the past, but this is how bad things can get, essentially. Right?
[42:02.160 --> 42:07.560]  You can go from a simple... It's amazing that you can go from a simple key to pretty much everything,
[42:07.560 --> 42:17.720]  from a leaked key to opening the doors of your company, revealing your personal life,
[42:17.720 --> 42:23.760]  personal things. So, please be careful with these things and try to implement some measures.
